Static index pruning in web search engines
نویسندگان
چکیده
منابع مشابه
Modeling Static Caching in Web Search Engines
In this paper we model a two-level cache of a Web search engine, such that given memory resources, we find the optimal split fraction to allocate for each cache, results and index. The final result is very simple and implies to compute just five parameters that depend on the input data and the performance of the search engine. The model is validated through extensive experimental results and is...
متن کاملEntropy-Based Static Index Pruning
We propose a new entropy-based algorithm for static index pruning. The algorithm computes an importance score for each document in the collection based on the entropy of each term. A threshold is set according to the desired level of pruning and all postings associated with documents that score below this threshold are removed from the index, i.e. documents are removed from the collection. We c...
متن کاملA five-level static cache architecture for web search engines
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reducing average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, an...
متن کامل3D Inverted Index with Cache Sharing for Web Search Engines
Web search engines achieve efficient performance by partitioning and replicating the indexing data structure used to support query processing. Current practice simply partitions and replicates the text collection on the set of cluster processors and then constructs in each processor an index data structure. This paper proposes a different approach by constructing an index data structure that pr...
متن کاملWeb Mining in Search Engines
Given the rate of growth of the Web, scalability of search engines is a key issue, as the amount of hardware and network resources needed is large, and expensive. In addition, search engines are popular tools, so they have heavy constraints on query answer time. So, the efficient use of resources can improve both scalability and answer time. One tool to achieve these goals is Web mining. Web mi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: ACM Transactions on Information Systems
سال: 2012
ISSN: 1046-8188,1558-2868
DOI: 10.1145/2094072.2094074